Understanding Editing Behaviors in Multilingual Wikipedia

Authors

  • Suin Kim
  • Sungjoon Park
  • Scott A. Hale
  • Sooyoung Kim
  • Jeongmin Byun
  • Alice H. Oh
Abstract

Multilingualism is common offline, but we have a more limited understanding of the ways multilingualism is displayed online and the roles that multilinguals play in the spread of content between speakers of different languages. We take a computational approach to studying multilingualism using one of the largest user-generated content platforms, Wikipedia. We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia. This dataset contains over two million paragraphs edited by over 15,000 multilingual users from July 8 to August 9, 2013. We analyze these multilingual editors in terms of their engagement, interests, and language proficiency in their primary and non-primary (secondary) languages and find that the English edition of Wikipedia displays different dynamics from the Spanish and German editions. Users primarily editing the Spanish and German editions make more complex edits than users who edit these editions as a second language. In contrast, users editing the English edition as a second language make edits that are just as complex as the edits by users who primarily edit the English edition. In this way, English serves a special role in bringing together content written by multilinguals from many language editions. Nonetheless, language remains a formidable hurdle to the spread of content: we find evidence for a complexity barrier whereby editors are less likely to edit complex content in a second language. In addition, we find that multilinguals are less engaged and show lower levels of language proficiency in their second languages. We also examine the topical interests of multilingual editors and find that there is no significant difference between primary and non-primary editors in each language.

Similar resources

The Workshops of the Tenth International AAAI Conference on Web and Social Media

As a global, multilingual project, Wikipedia could serve as a repository for the world’s knowledge on an astounding range of topics. However, questions of participation and diversity among editors continue to be burning issues. We present the first targeted study of participants at Greek Wikipedia, with the goal of better understanding their motivations. Smaller Wikipedias play a key role in fo...

Directions for Exploiting Asymmetries in Multilingual Wikipedia

Multilingual Wikipedia has been used extensively for a variety of Natural Language Processing (NLP) tasks. Many Wikipedia entries (people, locations, events, etc.) have descriptions in several languages. These descriptions, however, are not identical. On the contrary, descriptions in different languages created for the same Wikipedia entry can vary greatly in terms of description length and inform...

Document Categorization using Multilingual Associative Networks based on Wikipedia

Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual as...

Temporal Motifs Reveal the Dynamics of Editor Interactions in Wikipedia

Wikipedia is a collaborative setting with both combative and cooperative editing. We propose a new method for investigating the types of editor interactions using a novel representation of Wikipedia’s revision history as a temporal, bipartite network with multiple node and edge types for users and revisions. From this representation we identify significant author interactions as network motifs ...

Multilingual Word Sense Disambiguation Using Wikipedia

We present three approaches to word sense disambiguation that use Wikipedia as a source of sense annotations. Starting from a basic monolingual approach, we develop two multilingual systems: one that uses a machine translation system to create multilingual features, and one where multilingual features are extracted primarily through the interlingual links available in Wikipedia. Experiments on ...

Volume 11

Publication date: 2016